Why do some red wines taste better than others? Is it only because the wine tasters say so, or is there another way to tell? Can we tell what makes a great or a bad wine from its chemical properties? And if so, under what conditions is the quality of red wine at its best?
This is what we are going to explore: the relationship between chemical properties and wine quality.
The analysis includes: data structure, statistical summary, distribution plots, box plots of each variable vs. quality, a correlation matrix and scatter plots, final plots exploring the strongly correlated variables, and reflections.
The data set used in this analysis can be found here: https://s3.amazonaws.com/udacity-hosted-downloads/ud651/wineQualityInfo.txt.
First, let’s see how many samples the wine data contains:
## [1] 1599
The data set has 1599 samples.
Next, let’s look at all the variables.
## [1] "X" "fixed.acidity" "volatile.acidity"
## [4] "citric.acid" "residual.sugar" "chlorides"
## [7] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [10] "pH" "sulphates" "alcohol"
## [13] "quality"
X is the data entry number and quality is the output of the analysis, so there are 11 input variables, all of them chemical properties. The data is in wide format.
What about the structure of the data?
## 'data.frame': 1599 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
Quality is stored as an integer. All other variables (apart from the index X) are numeric.
A statistical summary of the data is shown below.
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1.0 Min. : 4.60 Min. :0.1200 Min. :0.000
## 1st Qu.: 400.5 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090
## Median : 800.0 Median : 7.90 Median :0.5200 Median :0.260
## Mean : 800.0 Mean : 8.32 Mean :0.5278 Mean :0.271
## 3rd Qu.:1199.5 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420
## Max. :1599.0 Max. :15.90 Max. :1.5800 Max. :1.000
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.900 Min. :0.01200 Min. : 1.00
## 1st Qu.: 1.900 1st Qu.:0.07000 1st Qu.: 7.00
## Median : 2.200 Median :0.07900 Median :14.00
## Mean : 2.539 Mean :0.08747 Mean :15.87
## 3rd Qu.: 2.600 3rd Qu.:0.09000 3rd Qu.:21.00
## Max. :15.500 Max. :0.61100 Max. :72.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9901 Min. :2.740 Min. :0.3300
## 1st Qu.: 22.00 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500
## Median : 38.00 Median :0.9968 Median :3.310 Median :0.6200
## Mean : 46.47 Mean :0.9967 Mean :3.311 Mean :0.6581
## 3rd Qu.: 62.00 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300
## Max. :289.00 Max. :1.0037 Max. :4.010 Max. :2.0000
## alcohol quality
## Min. : 8.40 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.20 Median :6.000
## Mean :10.42 Mean :5.636
## 3rd Qu.:11.10 3rd Qu.:6.000
## Max. :14.90 Max. :8.000
Quality ranges from 3 to 8. Residual.sugar, chlorides, free.sulfur.dioxide and total.sulfur.dioxide have a very wide range, with maxima far above their third quartiles. Do these variables influence wine quality?
First, let us explore the distribution of each variable using ggplot.
The data is in wide format, which makes it difficult to draw plots of multiple variables in R. Therefore, I reshaped the data into long format.
# reshape data into long format (melt() comes from the reshape2 package)
library(reshape2)
long_data <- melt(redwine, id.vars = c("X", "quality"))
Some of the variables appear to follow a roughly normal distribution, such as density, pH, alcohol, volatile.acidity and fixed.acidity. Others have a right-skewed distribution, such as residual.sugar, free.sulfur.dioxide, total.sulfur.dioxide, sulphates and chlorides.
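As a rough illustration of the reshape-then-facet approach, here is a minimal sketch on a made-up six-row data frame. The names `toy_wine`, `toy_long` and `p` are hypothetical, the values are not the real data, and the sketch assumes the reshape2 and ggplot2 packages are installed.

```r
library(reshape2)  # melt()
library(ggplot2)   # ggplot(), geom_histogram(), facet_wrap()

# Toy stand-in for the wine data (values are made up for illustration)
toy_wine <- data.frame(
  X = 1:6,
  alcohol = c(9.4, 9.8, 9.8, 10.5, 11.2, 12.0),
  pH = c(3.51, 3.20, 3.26, 3.16, 3.39, 3.36),
  quality = c(5L, 5L, 5L, 6L, 7L, 7L)
)

# Wide -> long: one row per (sample, variable) pair
toy_long <- melt(toy_wine, id.vars = c("X", "quality"))

# One histogram panel per chemical property, each with its own axis scales
p <- ggplot(toy_long, aes(x = value)) +
  geom_histogram(bins = 10) +
  facet_wrap(~ variable, scales = "free")
```

Printing `p` draws the panels. Using `scales = "free"` is a design choice here: the chemical properties live on very different scales, so a shared axis would flatten most of the histograms.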
Most of the wine samples had a quality of 5 or 6. Let’s get the exact figure.
# calculate the % of wine with quality 5 and 6
100*count(subset(redwine, quality == 5 | quality == 6))/length(redwine$quality)
## n
## 1 82.48906
82.49% of the wines had a quality of 5 or 6.
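The percentage can be cross-checked by hand. The sketch below uses the sample count (1599) and the number of wines with quality 5 or 6 (1319, the "good" group in the rating summary later in this report), then shows the same idea on a raw vector. The variable names and the `toy_quality` values are illustrative assumptions, not part of the original analysis.

```r
# Counts taken from this report: 1599 samples, 1319 of them with quality 5 or 6
n_total <- 1599
n_quality_5_or_6 <- 1319
pct <- 100 * n_quality_5_or_6 / n_total
round(pct, 2)

# The same idea on a raw quality vector (toy values, not the real data):
# the mean of a logical vector is the share of TRUE values
toy_quality <- c(5L, 6L, 5L, 7L, 3L, 6L, 5L, 8L)
toy_pct <- 100 * mean(toy_quality %in% c(5, 6))
```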
Let us compute the correlation matrix with ggpairs to see which chemical properties have strong relationships with wine quality and with each other. It was difficult to run ggpairs on all variables at once because the space allotted to the plot couldn’t hold a 12 x 12 grid of panels, so I created three groups and made sure that the variable “quality” (col 13) was present in all of them.
We learned that any correlation above 0.3 in absolute value is meaningful and above 0.7 is pretty strong. Let us see if we can find any in the results below.
The correlation coefficient between quality and volatile.acidity was -0.391, between citric.acid and fixed.acidity 0.672, and between citric.acid and volatile.acidity -0.552.
The correlation coefficient between total.sulfur.dioxide and free.sulfur.dioxide was 0.668.
The correlation coefficient between quality and alcohol was 0.476, and between pH and density -0.342.
From the above correlation analysis, I found that only alcohol and volatile.acidity had correlation coefficients with quality larger than 0.3 in absolute value. Since we are interested in what makes the best wine, it is important to also consider other chemical properties which may have some impact.
Let’s look at the correlation of each chemical property with quality.
## [,1]
## fixed.acidity 0.12405165
## volatile.acidity -0.39055778
## citric.acid 0.22637251
## residual.sugar 0.01373164
## chlorides -0.12890656
## free.sulfur.dioxide -0.05065606
## total.sulfur.dioxide -0.18510029
## density -0.17491923
## pH -0.05773139
## sulphates 0.25139708
## alcohol 0.47616632
We can see that 6 chemical properties (volatile.acidity, total.sulfur.dioxide, pH, free.sulfur.dioxide, density, chlorides) have a negative correlation with quality, which suggests that higher levels of those properties go with worse-tasting wine. Among them, volatile.acidity had the largest effect, with a correlation of -0.391. In contrast, sulphates, residual.sugar, fixed.acidity, citric.acid and alcohol have a positive correlation with quality. Among those, sulphates, citric.acid and alcohol had the strongest effect, with correlations of 0.251, 0.226 and 0.476 respectively.
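To make the reading of this table reproducible, the sketch below re-enters the printed coefficients as a named vector and applies the 0.3 rule of thumb. The object names are illustrative; on the real data a table like this could presumably be produced with something like `cor(redwine[, 2:12], redwine$quality)`, which is an assumption about code the report does not show.

```r
# Correlations with quality, copied from the output above
cor_with_quality <- c(
  fixed.acidity        =  0.12405165,
  volatile.acidity     = -0.39055778,
  citric.acid          =  0.22637251,
  residual.sugar       =  0.01373164,
  chlorides            = -0.12890656,
  free.sulfur.dioxide  = -0.05065606,
  total.sulfur.dioxide = -0.18510029,
  density              = -0.17491923,
  pH                   = -0.05773139,
  sulphates            =  0.25139708,
  alcohol              =  0.47616632
)

# Properties whose correlation with quality exceeds 0.3 in absolute value
meaningful <- cor_with_quality[abs(cor_with_quality) > 0.3]

# Properties negatively correlated with quality
negative <- names(cor_with_quality)[cor_with_quality < 0]

# All properties ranked by strength of correlation, strongest first
ranked <- sort(abs(cor_with_quality), decreasing = TRUE)
```

Ranking by absolute value keeps volatile.acidity near the top even though its sign is negative, which matches how the text treats it.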
From the box plots, it looks like alcohol, sulphates, volatile.acidity and citric.acid might have an impact on wine quality. The results are consistent with the previous correlation analysis.
Let’s zoom in on the plots of these chemical properties.
As the wine quality increased from 3 to 8, the average alcohol increased, except for quality 5. We can also see that wines of quality 5 have many outliers.
Let’s compare the distributions of alcohol for the different wine qualities.
The distributions of alcohol were similar and almost normal for all wine qualities except 5, where the distribution was much narrower.
Let’s look at the summary of alcohol for quality 5.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.5 9.4 9.7 9.9 10.2 14.9
The mean alcohol for quality 5 was 9.90.
Let’s compare it with the other qualities.
quality_vs_alcohol <- redwine %>%
group_by(quality) %>%
summarize(avg_alcohol = mean(alcohol)) %>%
arrange(avg_alcohol)
quality_vs_alcohol
## Source: local data frame [6 x 2]
##
## quality avg_alcohol
## (int) (dbl)
## 1 5 9.899706
## 2 3 9.955000
## 3 4 10.265094
## 4 6 10.629519
## 5 7 11.465913
## 6 8 12.094444
The average alcohol increased from 9.955 to 12.094 (1.2 times) as wine quality increased from 3 to 8, except for quality 5, where the average alcohol was 9.900.
As the wine quality increased from 3 to 8, the average citric.acid increased as well.
Let’s compare the distributions of citric.acid for the different wine qualities.
We can see that the mean of citric.acid shifted to the right as wine quality increased.
Let’s summarize and sort the mean of citric.acid.
quality_vs_citric.acid <- redwine %>%
group_by(quality) %>%
summarize(avg_citric.acid = mean(citric.acid)) %>%
arrange(avg_citric.acid)
quality_vs_citric.acid
## Source: local data frame [6 x 2]
##
## quality avg_citric.acid
## (int) (dbl)
## 1 3 0.1710000
## 2 4 0.1741509
## 3 5 0.2436858
## 4 6 0.2738245
## 5 7 0.3751759
## 6 8 0.3911111
It is clear that the average citric.acid increased from 0.171 to 0.391 (2.3 times) as quality increased from 3 to 8.
As the wine quality increased from 3 to 8, the average sulphates increased as well.
Let’s compare the distributions of sulphates for the different wine qualities.
We can see that the distributions of sulphates were similar and that the mean of sulphates shifted to the right as wine quality increased.
Let’s summarize and sort the mean of sulphates.
quality_vs_sulphates <- redwine %>%
group_by(quality) %>%
summarize(avg_sulphates = mean(sulphates)) %>%
arrange(avg_sulphates)
quality_vs_sulphates
## Source: local data frame [6 x 2]
##
## quality avg_sulphates
## (int) (dbl)
## 1 3 0.5700000
## 2 4 0.5964151
## 3 5 0.6209692
## 4 6 0.6753292
## 5 7 0.7412563
## 6 8 0.7677778
It is clear that the average sulphates increased from 0.570 to 0.768 (1.3 times) as quality increased from 3 to 8.
As the wine quality increased from 3 to 8, the average volatile.acidity decreased.
Let’s compare the distributions of volatile.acidity for the different wine qualities.
We can see that the distributions of volatile.acidity were similar and that the mean of volatile.acidity shifted to the left as wine quality increased.
Let’s summarize and sort the mean of volatile.acidity.
quality_vs_volatile.acidity <- redwine %>%
group_by(quality) %>%
summarize(avg_volatile.acidity = mean(volatile.acidity)) %>%
arrange(avg_volatile.acidity)
quality_vs_volatile.acidity
## Source: local data frame [6 x 2]
##
## quality avg_volatile.acidity
## (int) (dbl)
## 1 7 0.4039196
## 2 8 0.4233333
## 3 6 0.4974843
## 4 5 0.5770411
## 5 4 0.6939623
## 6 3 0.8845000
It is clear that the average volatile.acidity decreased from 0.885 at quality 3 to 0.423 at quality 8 (about 2.1 times lower), with the lowest average, 0.404, at quality 7.
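The "times" figures quoted for the four properties are ratios of the quality-8 average to the quality-3 average. A small sketch of that arithmetic, using the averages printed in the tables above (the vector names are illustrative):

```r
# Per-quality averages copied from the tables above
avg_q3 <- c(alcohol = 9.955, citric.acid = 0.171,
            sulphates = 0.57, volatile.acidity = 0.8845)
avg_q8 <- c(alcohol = 12.094444, citric.acid = 0.3911111,
            sulphates = 0.7677778, volatile.acidity = 0.4233333)

# Ratio of the quality-8 average to the quality-3 average
fold_change <- avg_q8 / avg_q3
round(fold_change, 2)

# volatile.acidity falls, so its ratio is below 1;
# invert it to read it as "times lower"
times_lower_va <- 1 / fold_change[["volatile.acidity"]]
```

Comparing only the two end points ignores the non-monotonic steps in between (for example alcohol at quality 5), so these ratios are a summary rather than a full description of the trend.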
There were moderate to strong correlations among the chemical properties, such as citric.acid with fixed.acidity (0.672), citric.acid with volatile.acidity (-0.552), total.sulfur.dioxide with free.sulfur.dioxide (0.668), and pH with density (-0.342).
Some chemicals were also correlated with quality: volatile.acidity (-0.391), alcohol (0.476) and, more weakly, sulphates (0.251) and citric.acid (0.226).
It is important to follow up with a multivariate analysis. In the previous bivariate analysis, we found that some chemicals correlated well with each other or with quality. In this section, we analyze how our feature of interest, quality, varies with the other chemical properties.
In order to simplify and see the relationships more clearly, I computed the average chemical properties for each quality level and added a new rating variable which groups the quality into three groups.
## Source: local data frame [6 x 12]
##
## quality avg_alcohol avg_citric.acid avg_sulphates avg_volatile.acidity
## (int) (dbl) (dbl) (dbl) (dbl)
## 1 5 9.899706 0.2436858 0.6209692 0.5770411
## 2 3 9.955000 0.1710000 0.5700000 0.8845000
## 3 4 10.265094 0.1741509 0.5964151 0.6939623
## 4 6 10.629519 0.2738245 0.6753292 0.4974843
## 5 7 11.465913 0.3751759 0.7412563 0.4039196
## 6 8 12.094444 0.3911111 0.7677778 0.4233333
## Variables not shown: avg_fixed.acidity (dbl), avg_pH (dbl),
## avg_residual.sugar (dbl), avg_density (dbl), avg_total.sulfur.dioxide
## (dbl), avg_free.sulfur.dioxide (dbl), avg_chlorides (dbl)
The above table shows the average value of each chemical property for every wine quality.
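The code that built this table of averages is not shown in the report. One way such a table could be produced, sketched here in base R on a made-up data frame (`toy_wine` and `avg_by_quality` are hypothetical names, and the original analysis may well have used dplyr instead):

```r
# Toy stand-in for the wine data (values are made up for illustration)
toy_wine <- data.frame(
  alcohol   = c(9.4, 9.8, 10.0, 10.6, 11.4, 12.0),
  sulphates = c(0.56, 0.68, 0.60, 0.66, 0.74, 0.78),
  quality   = c(5L, 5L, 6L, 6L, 7L, 7L)
)

# Mean of every other column within each quality level
avg_by_quality <- aggregate(. ~ quality, data = toy_wine, FUN = mean)

# Prefix the averaged columns with "avg_", mirroring the table above
names(avg_by_quality)[-1] <- paste0("avg_", names(avg_by_quality)[-1])
avg_by_quality
```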
Let’s see how the variables vary with quality and with each other.
# reshape data into long format
long_data_avg <- melt(quality_vs_total_variables, id.vars=c("quality"))
# convert the data to a data.table (requires the data.table package)
library(data.table)
wine_table <- data.table(redwine)
# add a new rating variable: bad (<= 4), good (5-6), very good (>= 7)
wine_table[, rating := ifelse(quality <= 4, "bad",
                       ifelse(quality >= 5 & quality <= 6, "good",
                       ifelse(quality >= 7, "very good", NA)))]
Let’s summarize the wine by rating.
wine_table %>%
group_by(rating) %>%
summarize(n_obs = n())
## Source: local data table [3 x 2]
##
## rating n_obs
## (chr) (int)
## 1 good 1319
## 2 very good 217
## 3 bad 63
So, there were 217 very good wines, 1319 good wines and 63 bad wines.
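The same three-way grouping can also be written without nested `ifelse()` calls. The following is an alternative sketch using base R's `cut()`, not the method used above; the `quality` vector is a toy input covering each score once.

```r
# One wine per quality score (toy input, not the real data)
quality <- c(3L, 4L, 5L, 6L, 7L, 8L)

# Intervals are right-closed by default: (-Inf, 4], (4, 6], (6, Inf)
rating <- cut(quality,
              breaks = c(-Inf, 4, 6, Inf),
              labels = c("bad", "good", "very good"))

table(rating)
```

`cut()` returns a factor whose levels are already ordered bad, good, very good, which can be convenient for plotting, whereas the `ifelse()` version returns plain character strings.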
## Source: local data table [3 x 12]
##
## rating avg_alcohol avg_citric.acid avg_sulphates avg_volatile.acidity
## (chr) (dbl) (dbl) (dbl) (dbl)
## 1 bad 10.21587 0.1736508 0.5922222 0.7242063
## 2 good 10.25272 0.2582638 0.6472631 0.5385595
## 3 very good 11.51805 0.3764977 0.7434562 0.4055300
## Variables not shown: avg_fixed.acidity (dbl), avg_pH (dbl),
## avg_residual.sugar (dbl), avg_density (dbl), avg_total.sulfur.dioxide
## (dbl), avg_free.sulfur.dioxide (dbl), avg_chlorides (dbl)
The table shows the average value of each chemical property for each wine rating.
# reshape data into long format
long_data_avg_rating <- melt(rating_vs_total_variables, id.vars=c("rating"))
## Source: local data frame [6 x 3]
##
## avg_fixed.acidity avg_citric.acid quality
## (dbl) (dbl) (int)
## 1 8.566667 0.3911111 8
## 2 8.872362 0.3751759 7
## 3 8.347179 0.2738245 6
## 4 8.167254 0.2436858 5
## 5 7.779245 0.1741509 4
## 6 8.360000 0.1710000 3
We can see a trend: from quality 4 to 8, higher quality generally goes with higher avg_fixed.acidity and avg_citric.acid. Average fixed.acidity rose from 7.78 to 8.57 and average citric.acid from 0.17 to 0.39 as wine quality increased from 4 to 8. This is consistent with fixed.acidity and citric.acid being strongly correlated with each other (correlation coefficient of 0.672) and both being positively correlated with quality (0.124 and 0.226 respectively).
We can see that free.sulfur.dioxide and total.sulfur.dioxide are correlated with each other but not with quality. It is interesting to note that the wine quality was best in the middle range of both chemical properties (around 14 and 35 respectively).
## Source: local data frame [6 x 3]
##
## avg_free.sulfur.dioxide avg_total.sulfur.dioxide quality
## (dbl) (dbl) (int)
## 1 13.27778 33.44444 8
## 2 14.04523 35.02010 7
## 3 15.71160 40.86991 6
## 4 16.98385 56.51395 5
## 5 12.26415 36.24528 4
## 6 11.00000 24.90000 3
It was also noted that the average total.sulfur.dioxide was similar in bad and very good wine, while free.sulfur.dioxide was around 2 mg/dm^3 higher in very good wine. When the concentrations of both chemicals increased further, the wine quality dropped. This suggests that low concentrations of these chemicals make wine taste bad, but too much of them (above 35 mg/dm^3 for total.sulfur.dioxide and 14 mg/dm^3 for free.sulfur.dioxide) also reduces wine quality.
pH and density were slightly correlated with each other but not with quality. Lower values of both pH and density went with higher quality. Higher pH seems to reduce quality, while the effect of density was not clear.
## Source: local data frame [6 x 3]
##
## avg_pH avg_density quality
## (dbl) (dbl) (int)
## 1 3.267222 0.9952122 8
## 2 3.290754 0.9961043 7
## 3 3.318072 0.9966151 6
## 4 3.304949 0.9971036 5
## 5 3.381509 0.9965425 4
## 6 3.398000 0.9974640 3
We can see that the average density only varied between 0.995 and 0.997 g/cm^3, which is a very small range, so density appears to have very little impact on quality. The average pH fell from 3.398 to 3.267 as wine quality increased from 3 to 8, so we can conclude that pH and quality are negatively correlated.
Average sulphates and average alcohol were strongly correlated with each other across quality levels. Average sulphates increased from 0.57 to 0.77 and average alcohol from 9.96 to 12.09 as quality increased from 3 to 8.
## Source: local data frame [6 x 3]
##
## avg_sulphates avg_alcohol quality
## (dbl) (dbl) (int)
## 1 0.7677778 12.094444 8
## 2 0.7412563 11.465913 7
## 3 0.6753292 10.629519 6
## 4 0.6209692 9.899706 5
## 5 0.5964151 10.265094 4
## 6 0.5700000 9.955000 3
## Source: local data frame [6 x 3]
##
## avg_total.sulfur.dioxide avg_volatile.acidity quality
## (dbl) (dbl) (int)
## 1 33.44444 0.4233333 8
## 2 35.02010 0.4039196 7
## 3 40.86991 0.4974843 6
## 4 56.51395 0.5770411 5
## 5 36.24528 0.6939623 4
## 6 24.90000 0.8845000 3
Total.sulfur.dioxide and volatile.acidity were not correlated with each other. The average total.sulfur.dioxide was low (around 25 to 36) for qualities 3-4 and 7-8, and higher for qualities 5-6. Volatile.acidity, in contrast, was strongly negatively correlated with quality: its average decreased from 0.88 to 0.42 as quality increased from 3 to 8.
pH and density were slightly correlated with each other but not with quality. Lower values of both pH and density went with higher quality. Higher pH seems to reduce quality, while the effect of density was not clear.
Total.sulfur.dioxide and volatile.acidity were not correlated with each other. Total.sulfur.dioxide was not correlated with wine quality, while volatile.acidity was.
Some chemicals correlated well with each other but not with quality, such as free.sulfur.dioxide and total.sulfur.dioxide. It is interesting to note that the wine quality was best in the middle range of both chemical properties (around 14 and 35 respectively).
Some chemical properties were strongly correlated with each other and with wine quality, particularly:
Fixed.acidity and citric.acid were strongly correlated with each other. Average fixed.acidity rose from 7.78 to 8.57 and average citric.acid from 0.17 to 0.39 as wine quality increased from 4 to 8.
Average sulphates and average alcohol were strongly correlated with each other across quality levels. Average sulphates increased from 0.57 to 0.77 and average alcohol from 9.96 to 12.09 as quality increased from 3 to 8.
We have explored the red wine data with many interesting questions about the data structure, the data summary and how chemical properties vary with each other and with our feature of interest, quality. We have done statistical analysis and drawn many different kinds of plots such as histograms, box plots and bar graphs. Let’s summarize the findings in three plots.
From plot 1, we can see that alcohol, citric.acid, fixed.acidity and sulphates positively influenced wine quality (green bars). Among those properties, sulphates, citric.acid and alcohol had the strongest impact, with correlations of 0.251, 0.226 and 0.476 respectively.
Volatile.acidity, total.sulfur.dioxide, density and chlorides negatively influenced wine quality (red bars). Among those properties, volatile.acidity had the strongest impact, with a correlation of -0.391.
Having found that alcohol and volatile.acidity have the strongest impact on wine quality, let’s summarize their relationships with wine quality. The plots below were selected and improved from the bivariate plots section.
Statistical summary of how average alcohol and volatile.acidity vary with quality:
## Source: local data frame [6 x 3]
##
## quality avg_alcohol avg_volatile.acidity
## (int) (dbl) (dbl)
## 1 3 9.955000 0.8845000
## 2 4 10.265094 0.6939623
## 3 5 9.899706 0.5770411
## 4 6 10.629519 0.4974843
## 5 8 12.094444 0.4233333
## 6 7 11.465913 0.4039196
As average volatile.acidity increased from 0.42 to 0.88, wine quality fell from 8 to 3, while as average alcohol increased from 9.96 to 12.09, wine quality rose from 3 to 8. These results are consistent with the correlation findings, where volatile.acidity had a correlation coefficient of -0.391 and alcohol 0.476. I suggest using these two chemical properties as the main features of a model for predicting quality.
Next, let’s see whether any of the chemical properties were strongly correlated with each other as well as with quality. The plots below were selected and improved from the multivariate plots section.
Note that I grouped the wine quality into 3 groups: bad (quality of 3 and 4), good (quality of 5 and 6) and very good (quality of 7 and 8).
Statistical summary of how average alcohol, sulphates, citric.acid and fixed.acidity vary with rating:
## rating avg_alcohol avg_sulphates avg_citric.acid avg_fixed.acidity
## 1 bad 10.21587 0.5922222 0.1736508 7.871429
## 2 good 10.25272 0.6472631 0.2582638 8.254284
## 3 very good 11.51805 0.7434562 0.3764977 8.847005
We found that:
Fixed.acidity and citric.acid were strongly correlated with each other. Average fixed.acidity rose from 7.87 to 8.85 and average citric.acid from 0.17 to 0.38 as the wine rating went from bad to very good.
Average sulphates and average alcohol were strongly correlated with each other across ratings. Average sulphates rose from 0.59 to 0.74 and average alcohol from 10.22 to 11.52 as the wine rating went from bad to very good.
It is interesting to note that all four chemical properties were positively correlated with wine quality, as shown in plot 1: fixed.acidity, sulphates, citric.acid and alcohol had correlations of 0.124, 0.251, 0.226 and 0.476 respectively.
When we build a model for predicting quality, we should select the features carefully so that no two or three features are too strongly correlated with each other.
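As a hedged sketch of such a screening step, the snippet below re-enters the pairwise coefficients quoted earlier in this report and flags the pairs above a cut-off. The 0.6 cut-off and the object names are assumptions made for illustration; the report does not fix a threshold for feature selection.

```r
# Pairwise correlations between chemical properties, as quoted in this report
pair_cor <- c(
  "citric.acid ~ fixed.acidity"                = 0.672,
  "citric.acid ~ volatile.acidity"             = -0.552,
  "total.sulfur.dioxide ~ free.sulfur.dioxide" = 0.668,
  "pH ~ density"                               = -0.342
)

# Assumed cut-off: pairs above it should not both enter the same model
threshold <- 0.6
too_correlated <- names(pair_cor)[abs(pair_cor) > threshold]
too_correlated
```

With this cut-off, only one variable from each flagged pair would be kept as a model input.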
The data has 11 chemical properties. Investigating all the variables by plotting individual graphs would take too much time. Therefore, I searched the R libraries and R-bloggers to learn how to code this in R. First, I generated the histograms and box plots by writing a function that iterated over all the variable names. It worked, but it was still too slow and complicated. Then I read the paper “Tidy data” by Hadley Wickham. It changed everything. I reshaped the data into long format and applied the facet_wrap function to draw the histograms and box plots for all variables at once. It was fast and effective.
When I tried to use line plots to investigate the relationships between chemical properties and wine quality, they were very hard to read because the data were spread out. With box plots, the trends in the data were much clearer.
All chemical properties had outliers and were widely spread out, which made the box plots hard to read. Therefore, I used the average values, which made it easier to identify which chemical properties may influence wine quality.
With the average values of the variables, I could plot the relationship between each variable and quality. However, average alcohol went down as wine quality went up from 4 to 5, which was not consistent with the positive correlation coefficient between alcohol and wine quality. So, I grouped the wine quality into three groups and added a new variable, “rating”. The resulting graph told a better and clearer story.
In working on the final plots, I had found which variables influence wine quality, but I wanted to present them in a clear way. I tried different plots and combinations of them. Finally, I settled on combining the graph of correlation coefficients with the relationships between selected variables and wine rating. It was a good summary plot since it showed what I wanted to summarize: alcohol, citric.acid, sulphates and volatile.acidity influence wine quality.
Also, I understand that it seems hard to predict the quality of wine based on physicochemical properties alone. As with all kinds of food and drink, its organic chemical properties are very important. I also found that citric.acid, fixed.acidity, pH and density were strongly correlated with each other (some of their correlation coefficients were higher than 0.6 in absolute value). Free.sulfur.dioxide and total.sulfur.dioxide were also strongly correlated. Therefore, I recommend that further studies pay attention to those variables.
Based on the overall summary and findings of the analysis, I recommend that a further study use alcohol, citric.acid, sulphates and volatile.acidity as inputs to a linear regression that predicts red wine quality. Since alcohol and sulphates were well correlated, as were citric.acid and fixed.acidity, these features should be handled carefully to avoid errors.
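To give an idea of what the suggested follow-up could look like, here is a minimal sketch of a linear regression in base R. It runs on simulated data, not on the wine data: the value ranges are borrowed from the summary table, but the coefficients used to generate the response (2, 0.4, -2) and the noise level are invented purely so that the example is runnable. The results say nothing about the real wines.

```r
set.seed(42)
n <- 200

# Simulated predictors, drawn uniformly within the ranges seen in the summary
sim <- data.frame(
  alcohol          = runif(n, 8.4, 14.9),
  volatile.acidity = runif(n, 0.12, 1.58)
)

# Simulated response: coefficients and noise are assumptions for illustration
sim$quality <- 2 + 0.4 * sim$alcohol - 2 * sim$volatile.acidity +
  rnorm(n, sd = 0.3)

# Fit the linear model and inspect the estimated effects
fit <- lm(quality ~ alcohol + volatile.acidity, data = sim)
coef(fit)
summary(fit)$r.squared

# Check that the two predictors are not strongly correlated with each other
cor(sim$alcohol, sim$volatile.acidity)
```

On the real data, `sim` would be replaced by `redwine` and further inputs such as sulphates or citric.acid could be added to the formula. Since quality is an integer score, an ordinal model might also be worth considering, which goes beyond what this report covers.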